Analysis of Micromobility Dock Deployment Strategies

Author
Affiliation

Joshua Sweren

Georgetown University

Code
#install.packages(c("osmdata", "sf", "dplyr", "ggplot2", "rnaturalearth"))
library(osmdata)
Code
library(sf)
Code
library(dplyr)
Code
library(ggplot2)
library(rnaturalearth)
library(osmextract)
Code
library(tigris)
Code
library(mapview)
library(lehdr)
library(viridis)
Code
library(viridisLite)
library(spatstat)
Code
library(units)
Code
library(spdep)
Code
library(terra)
Code
library(raster)
Code
library(tictoc)
Code
library(tidycensus)
library(leaflet)
Source: Article Notebook

Github repository: https://github.com/jsweren1/gis-final-project

Introduction

Over the last decade, bikeshare systems have become increasingly prevalent in cities around the world in recent years, offering residents and visitors a convenient, low-cost, and environmentally friendly transportation option. However, the spatial deployment of these systems has significant implications over which communities benefit the most and the extent to which they can be of service. This project will analyze the geographic distribution of micromobility stations in several metropolitan areas and evaluate how access to these low-cost transportation options varies across neighborhoods.

As of October 2025, there are five metropolitan areas in the U.S. whose docked bikeshare ridership far exceeds the rest: New York City, Boston, San Francisco, Chicago, and Washington, DC. These cities’ bikeshare systems each boasted ridership of over 400,000 in October 2025, while the sixth-place city registered under 100,000.1 These numbers are similarly disparate in other months. Thus, this project will focus on five cities with proven commitment to developing a system with geographic breadth. Unsurprisingly, these cities also have existing robust public transit networks, so this project will not only look at spatial deployment in relation to population, but also in relation to existing amenities for certain residents. Using OpenStreetMap data to locate bikeshare docks and other transit hubs, along with U.S. Census data to map population density, we will compare how bikeshare has been spatially deployed in its most prominent cases.

Literature Review

The following articles were provided important context on what research has already been done regarding this topic:

  1. Optimal locations for bikeshare stations: A new GIS based spatial approach2
  2. Using a GIS-based spatial approach to determine the optimal locations of bikeshare stations: The case of Washington D.C3

These articles analyzed bikeshare dock location strategies in Baltimore and Washington, D.C., respectively. The authors studying Baltimore focused mainly on optimizing usage of these docks and understanding what factors lead to greater ridership. They found that bikeshare usage depended heavily on the presence of neighboring docks, meaning clustered docks tended to benefit people more than isolated ones. In the case of Baltimore, the bikeshare system is significantly less thorough and mature than the five cities we will be studying, so the effects of isolated bikeshare docks were perhaps more apparent than we would find. This indicates that it will be important to study the clustering of docks as a univariate test.

The authors studying Washington, D.C. also focused on optimal locations, mainly as they relate to proximity to bus stops and train stations. For cities with existing public transit infrastructure, bikeshare can be seen as a supplement to heavily used services, not only as its own form of transportation. However, this study also emphasized the importance of bikeshare in “low-density residential neighborhoods where transit network coverage might be low”. This highlights a need for us to look at spatial equity in micromobility infrastructure, as bikeshare systems can provide greater accessibility to cheaper transit options.

Methodology

Each city will be analyzed via the following sets of hypotheses:

  1. Docks in relation to population density:

    • Null hypothesis: There is no relationship between the spatial distribution of bikeshare docks and population density. Dock locations occur independently of areas with higher or lower population density.
    • Alternative hypothesis: There is a relationship between the spatial distribution of Capital Bikeshare docks and population density. Dock locations are more likely to occur in areas with higher population density.
  2. Docks in relation to other public transit options:

    • Null hypothesis: There is no relationship between the spatial distribution of bikeshare docks and the presence of other public transit nodes.

    • Alternative hypothesis: There is a relationship between the spatial distribution of bikeshare docks and the presence of other public transit nodes. Dock locations are more likely to be placed in areas with existing public transit infrastructure.

The first steps towards testing these hypotheses will be to gather bikeshare dock and train station locations from OpenStreetMap via the osm package in R and population density data from the U.S. Census via the tidycensus package. To create a geographic region for each city, we will create a concave hull around all of the bikeshare docks. Since metropolitan areas differ in the distance from the city’s center that bikeshare systems reach, this will allow us to only consider areas within the chosen scope of each system, and how docks have been deployed within those limitations.

The following information will then be gathered to test the hypotheses:

  • Moran’s I to generally understand clustering of bike docks.

  • Regression testing results to analyze the relationship between dock locations, population density, and public transit options.

  • Monte Carlo simulations to compare actual results to randomized data.

Exploratory Data Analysis

Prior to performing any hypothesis tests, we will start by plotting all bikeshare docks from each metropolitan area, along with a reasonable geographic region to surround them. The chosen methodology for selecting a region is to take a concave hull using spdep in R, with 0.2 as the argument for this function. This was chosen so that each metropolitan area’s docks would fit in a contiguous region that accurately represents the extent of the system’s reach, while limiting the amount of uninhabitable space that would be captured if we used a convex hull. Padding was added proportionally to the overall extent of each bikeshare system, simply to avoid docks being directly on the edge of our regions.

The plots below visualize bikeshare docks, overlaid over these chosen regions for our five cities of interest.

Plotting the Docks

Code
dc_bb <- c(xmin = -77.45, ymin = 38.70, xmax = -76.80, ymax = 39.10)
docks_raw_dc <- opq(bbox = dc_bb) |>
  add_osm_feature(key = "amenity", value = "bicycle_rental") |>
  add_osm_feature(key = "network", value = "Capital Bikeshare") |>
  osmdata_sf()

docks_dc <- docks_raw_dc$osm_points |>
  st_as_sf() |>
  st_transform(3857) %>%
  filter(!is.na(network))
Code
hull_dc <- docks_dc |> 
    st_union() |> 
    st_concave_hull(.2)
hull_buffered_dc <- st_buffer(hull_dc, dist = 2500)
hull_buffered_wgs84_dc <- st_transform(hull_buffered_dc, 4326)
docks_dc <- st_intersection(
    st_transform(docks_dc, 4326), 
    hull_buffered_wgs84_dc)
Code
mapview(hull_buffered_wgs84_dc,
        col.regions = "lightblue",
        alpha.regions = 0.3) +
mapview(docks_dc, col.regions = "red",
  layer.name = "Bikeshare Docks")
Code
nyc_bb <- c(xmin = -74.15, ymin = 40.55, xmax = -73.70, ymax = 40.95)
docks_raw_nyc <- opq(bbox = nyc_bb) |>
  add_osm_feature(key = "amenity", value = "bicycle_rental") |>
  add_osm_feature(key = "network", value = "Citi Bike") |>
  osmdata_sf()

docks_nyc <- docks_raw_nyc$osm_points |>
  st_as_sf() |>
  st_transform(3857) %>%
  filter(!is.na(network))
Code
hull_nyc <- docks_nyc |> 
  st_union() |> 
  st_concave_hull(.2)
hull_buffered_nyc <- st_buffer(hull_nyc, dist = 500)
hull_buffered_wgs84_nyc <- st_transform(hull_buffered_nyc, 4326)
docks_nyc <- st_intersection(
  st_transform(docks_nyc, 4326),
  hull_buffered_wgs84_nyc
)
Code
mapview(hull_buffered_wgs84_nyc, col.regions = "lightblue", alpha.regions = 0.3) +
  mapview(docks_nyc, col.regions = "red",
  layer.name = "Bikeshare Docks")
Code
bos_bb <- c(-71.20, 42.20, -70.90, 42.45)
docks_raw_bos <- opq(bbox = bos_bb) |>
  add_osm_feature(key = "amenity", value = "bicycle_rental") |>
  add_osm_feature(key = "network", value = "Bluebikes") |>
  osmdata_sf()

docks_bos <- docks_raw_bos$osm_points |>
  st_as_sf() |>
  st_transform(3857) %>%
  filter(!is.na(network))
Code
hull_bos <- docks_bos |> 
  st_union() |> 
  st_concave_hull(.2)
hull_buffered_bos <- st_buffer(hull_bos, dist = 1000)
hull_buffered_wgs84_bos <- st_transform(hull_buffered_bos, 4326)
docks_bos <- st_intersection(
  st_transform(docks_bos, 4326),
  hull_buffered_wgs84_bos
)
Code
mapview(hull_buffered_wgs84_bos, col.regions = "lightblue", alpha.regions = 0.3) +
  mapview(docks_bos, col.regions = "red",
  layer.name = "Bikeshare Docks")
Code
sf_bb <- c(-122.55, 37.70, -122.35, 37.85)
docks_raw_sf <- opq(bbox = sf_bb) |>
  add_osm_feature(key = "amenity", value = "bicycle_rental") |>
  add_osm_feature(key = "network", value = "Bay Wheels") |>
  osmdata_sf()

docks_sf <- docks_raw_sf$osm_points |>
  st_as_sf() |>
  st_transform(3857) %>%
  filter(!is.na(network))
Code
hull_sf <- docks_sf |> 
  st_union() |> 
  st_concave_hull(.2)
hull_buffered_sf <- st_buffer(hull_sf, dist = 1000)
hull_buffered_wgs84_sf <- st_transform(hull_buffered_sf, 4326)
docks_sf <- st_intersection(
  st_transform(docks_sf, 4326),
  hull_buffered_wgs84_sf
)
Code
mapview(hull_buffered_wgs84_sf, col.regions = "lightblue", alpha.regions = 0.3) +
  mapview(docks_sf, col.regions = "red",
  layer.name = "Bikeshare Docks")
Code
chi_bb <- c(-87.80, 41.75, -87.55, 42.00)
docks_raw_chi <- opq(bbox = chi_bb) |>
  add_osm_feature(key = "amenity", value = "bicycle_rental") |>
  add_osm_feature(key = "network", value = "Divvy") |>
  osmdata_sf()

docks_chi <- docks_raw_chi$osm_points |>
  st_as_sf() |>
  st_transform(3857) %>%
  filter(!is.na(network))
Code
hull_chi <- docks_chi |> 
  st_union() |> 
  st_concave_hull(.2)
hull_buffered_chi <- st_buffer(hull_chi, dist = 2500)
hull_buffered_wgs84_chi <- st_transform(hull_buffered_chi, 4326)
docks_chi <- st_intersection(
  st_transform(docks_chi, 4326),
  hull_buffered_wgs84_chi
)
Code
mapview(hull_buffered_wgs84_chi, col.regions = "lightblue", alpha.regions = 0.3) +
  mapview(docks_chi, col.regions = "red",
  layer.name = "Bikeshare Docks")

The pattern of docks without any additional data already show us some differences in layout. New York and Chicago appear to have the most exhaustive bikeshare systems, with docks spanning nearly the entire geographic region with relative frequency everywhere. In Boston and Washington, docks seem to be more strategically placed with clusters extending away from the downtown area in various directions. Meanwhile, San Francisco has more sporadic placement with clustering only appearing in the downtown area.

While these maps all suggest divergent strategies for placing bikeshare docks, the next section will help pinpoint factors that may influence these decisions. Using Census data from tidycensus and train station locations from OpenStreetMap via the osm package, we can take a look at two variables that urban planners would likely be interested in, according to our literature review.

Plotting Other Data

Code
#census_api_key("5e8593d6e289e6ae04c33a019f4951ca2bdd9523", install = TRUE)
dc_bg <- get_acs(
  geography = "block group",
  variables = "B01003_001E",
  year = 2022,
  survey = "acs5",
  state = c("DC", "MD", "VA"),
  geometry = TRUE
)
Code
dc_bg <- st_transform(dc_bg, st_crs(hull_buffered_wgs84_dc))

dc_bg <- dc_bg %>%
  mutate(area_original = as.numeric(st_area(geometry)))

dc_bg_clipped <- st_intersection(dc_bg, hull_buffered_wgs84_dc)
Code
dc_bg_clipped <- dc_bg_clipped %>%
  mutate(
    area_clipped = as.numeric(st_area(geometry)),
    pop_adjusted = estimate * (area_clipped / area_original),
    area_km2 = area_clipped / 1e6,
    pop_density = pop_adjusted / area_km2
  )

dc_bg_density <- dc_bg_clipped %>%
  mutate(
    area_km2 = as.numeric(st_area(geometry)) / 1e6,
    pop_density = estimate / area_km2
  )
Code
max_density <- 20000
dc_bg_density$pop_density_plot <- pmin(dc_bg_density$pop_density, max_density)
Code
dc_bbox <- st_bbox(hull_buffered_wgs84_dc)

# bus_stops <- opq(bbox) %>%
#   add_osm_feature(key = "highway", value = "bus_stop") %>%
#   osmdata_sf() %>%
#   .$osm_points %>%
#   st_transform(st_crs(hull_buffered_wgs84_dc))

stations_dc <- opq(dc_bbox) %>%
  add_osm_feature(key = "railway", value = "station") %>%
  osmdata_sf() %>%
  .$osm_points %>%
  st_transform(st_crs(hull_buffered_wgs84_dc))
Code
pal <- leaflet::colorNumeric(viridis(20), domain = dc_bg_density$pop_density_plot)
leaflet() %>%
  leaflet::addProviderTiles("CartoDB.Positron") %>%
  
  # Population density polygons
  leaflet::addPolygons(
    data = dc_bg_density,
    fillColor = ~pal(pop_density_plot),
    fillOpacity = 0.6,
    color = "grey",
    weight = 0.5,
    group = "Population Density",
    popup = ~paste0("Population density: ", round(pop_density_plot))
  ) %>%
  
  # Dock locations
  leaflet::addCircleMarkers(
    data = docks_dc,
    radius = 1,
    color = "blue",
    fill = TRUE,
    fillOpacity = 0.5,
    group = "Bikeshare Docks"
  ) %>%
  
  # Train stations
  leaflet::addCircleMarkers(
    data = stations_dc,
    radius = 1,
    color = "red",
    fill = TRUE,
    fillOpacity = 0.5,
    group = "Train Stations"
  ) %>%
  
  # Layer controls
  leaflet::addLayersControl(
    overlayGroups = c("Population Density", "Bikeshare Docks", "Train Stations"),
    options = leaflet::layersControlOptions(collapsed = FALSE)
  )
Code
nyc_bg <- get_acs(
  geography = "block group",
  variables = "B01003_001E",
  year = 2022,
  survey = "acs5",
  state = c("NY", "NJ"),
  geometry = TRUE
)
Code
nyc_bg <- st_transform(nyc_bg, st_crs(hull_buffered_wgs84_nyc))

nyc_bg <- nyc_bg %>%
  mutate(area_original = as.numeric(st_area(geometry)))

nyc_bg_clipped <- st_intersection(nyc_bg, hull_buffered_wgs84_nyc)
Code
nyc_bg_clipped <- nyc_bg_clipped %>%
  mutate(
    area_clipped = as.numeric(st_area(geometry)),
    pop_adjusted = estimate * (area_clipped / area_original),
    area_km2 = area_clipped / 1e6,
    pop_density = pop_adjusted / area_km2
  )

nyc_bg_density <- nyc_bg_clipped %>%
  mutate(
    area_km2 = as.numeric(st_area(geometry)) / 1e6,
    pop_density = estimate / area_km2
  )
Code
max_density <- 75000
nyc_bg_density$pop_density_plot <- pmin(nyc_bg_density$pop_density, max_density)
Code
nyc_bbox <- st_bbox(hull_buffered_wgs84_nyc)

# bus_stops <- opq(bbox) %>%
#   add_osm_feature(key = "highway", value = "bus_stop") %>%
#   osmdata_sf() %>%
#   .$osm_points %>%
#   st_transform(st_crs(hull_buffered_wgs84_dc))

stations_nyc <- opq(nyc_bbox) %>%
  add_osm_feature(key = "railway", value = "station") %>%
  osmdata_sf() %>%
  .$osm_points %>%
  st_transform(st_crs(hull_buffered_wgs84_nyc))
Code
pal <- leaflet::colorNumeric(viridis(20), domain = nyc_bg_density$pop_density_plot)
leaflet() %>%
  leaflet::addProviderTiles("CartoDB.Positron") %>%
  
  # Population density polygons
  leaflet::addPolygons(
    data = nyc_bg_density,
    fillColor = ~pal(pop_density_plot),
    fillOpacity = 0.6,
    color = "grey",
    weight = 0.5,
    group = "Population Density",
    popup = ~paste0("Population density: ", round(pop_density_plot))
  ) %>%
  
  # Dock locations
  leaflet::addCircleMarkers(
    data = docks_nyc,
    radius = 1,
    color = "blue",
    fill = TRUE,
    fillOpacity = 0.5,
    group = "Bikeshare Docks"
  ) %>%
  
  # Train stations
  leaflet::addCircleMarkers(
    data = stations_nyc,
    radius = 1,
    color = "red",
    fill = TRUE,
    fillOpacity = 0.5,
    group = "Train Stations"
  ) %>%
  
  # Layer controls
  leaflet::addLayersControl(
    overlayGroups = c("Population Density", "Bikeshare Docks", "Train Stations"),
    options = leaflet::layersControlOptions(collapsed = FALSE)
  )
Code
bos_bg <- get_acs(
  geography = "block group",
  variables = "B01003_001E",
  year = 2022,
  survey = "acs5",
  state = c("MA"),
  geometry = TRUE
)
Code
bos_bg <- st_transform(bos_bg, st_crs(hull_buffered_wgs84_bos))

bos_bg <- bos_bg %>%
  mutate(area_original = as.numeric(st_area(geometry)))

bos_bg_clipped <- st_intersection(bos_bg, hull_buffered_wgs84_bos)
Code
bos_bg_clipped <- bos_bg_clipped %>%
  mutate(
    area_clipped = as.numeric(st_area(geometry)),
    pop_adjusted = estimate * (area_clipped / area_original),
    area_km2 = area_clipped / 1e6,
    pop_density = pop_adjusted / area_km2
  )

bos_bg_density <- bos_bg_clipped %>%
  mutate(
    area_km2 = as.numeric(st_area(geometry)) / 1e6,
    pop_density = estimate / area_km2
  )
Code
max_density <- 25000
bos_bg_density$pop_density_plot <- pmin(bos_bg_density$pop_density, max_density)
Code
bos_bbox <- st_bbox(hull_buffered_wgs84_bos)

# bus_stops <- opq(bbox) %>%
#   add_osm_feature(key = "highway", value = "bus_stop") %>%
#   osmdata_sf() %>%
#   .$osm_points %>%
#   st_transform(st_crs(hull_buffered_wgs84_dc))

stations_bos <- opq(bos_bbox) %>%
  add_osm_feature(key = "railway", value = "station") %>%
  osmdata_sf() %>%
  .$osm_points %>%
  st_transform(st_crs(hull_buffered_wgs84_bos))
Code
pal <- leaflet::colorNumeric(viridis(20), domain = bos_bg_density$pop_density_plot)
leaflet() %>%
  leaflet::addProviderTiles("CartoDB.Positron") %>%
  
  # Population density polygons
  leaflet::addPolygons(
    data = bos_bg_density,
    fillColor = ~pal(pop_density_plot),
    fillOpacity = 0.6,
    color = "grey",
    weight = 0.5,
    group = "Population Density",
    popup = ~paste0("Population density: ", round(pop_density_plot))
  ) %>%
  
  # Dock locations
  leaflet::addCircleMarkers(
    data = docks_bos,
    radius = 1,
    color = "blue",
    fill = TRUE,
    fillOpacity = 0.5,
    group = "Bikeshare Docks"
  ) %>%
  
  # Train stations
  leaflet::addCircleMarkers(
    data = stations_bos,
    radius = 1,
    color = "red",
    fill = TRUE,
    fillOpacity = 0.5,
    group = "Train Stations"
  ) %>%
  
  # Layer controls
  leaflet::addLayersControl(
    overlayGroups = c("Population Density", "Bikeshare Docks", "Train Stations"),
    options = leaflet::layersControlOptions(collapsed = FALSE)
  )
Code
sf_bg <- get_acs(
  geography = "block group",
  variables = "B01003_001E",
  year = 2022,
  survey = "acs5",
  state = c("CA"),
  geometry = TRUE
)
Code
sf_bg <- st_transform(sf_bg, st_crs(hull_buffered_wgs84_sf))

sf_bg <- sf_bg %>%
  mutate(area_original = as.numeric(st_area(geometry)))

sf_bg_clipped <- st_intersection(sf_bg, hull_buffered_wgs84_sf)
Code
sf_bg_clipped <- sf_bg_clipped %>%
  mutate(
    area_clipped = as.numeric(st_area(geometry)),
    pop_adjusted = estimate * (area_clipped / area_original),
    area_km2 = area_clipped / 1e6,
    pop_density = pop_adjusted / area_km2
  )

sf_bg_density <- sf_bg_clipped %>%
  mutate(
    area_km2 = as.numeric(st_area(geometry)) / 1e6,
    pop_density = estimate / area_km2
  )
Code
max_density <- 35000
sf_bg_density$pop_density_plot <- pmin(sf_bg_density$pop_density, max_density)
Code
sf_bbox <- st_bbox(hull_buffered_wgs84_sf)

# bus_stops <- opq(bbox) %>%
#   add_osm_feature(key = "highway", value = "bus_stop") %>%
#   osmdata_sf() %>%
#   .$osm_points %>%
#   st_transform(st_crs(hull_buffered_wgs84_dc))

stations_sf <- opq(sf_bbox) %>%
  add_osm_feature(key = "railway", value = "station") %>%
  osmdata_sf() %>%
  .$osm_points %>%
  st_transform(st_crs(hull_buffered_wgs84_sf))
Code
pal <- leaflet::colorNumeric(viridis(20), domain = sf_bg_density$pop_density_plot)
leaflet() %>%
  leaflet::addProviderTiles("CartoDB.Positron") %>%
  
  # Population density polygons
  leaflet::addPolygons(
    data = sf_bg_density,
    fillColor = ~pal(pop_density_plot),
    fillOpacity = 0.6,
    color = "grey",
    weight = 0.5,
    group = "Population Density",
    popup = ~paste0("Population density: ", round(pop_density_plot))
  ) %>%
  
  # Dock locations
  leaflet::addCircleMarkers(
    data = docks_sf,
    radius = 1,
    color = "blue",
    fill = TRUE,
    fillOpacity = 0.5,
    group = "Bikeshare Docks"
  ) %>%
  
  # Train stations
  leaflet::addCircleMarkers(
    data = stations_sf,
    radius = 1,
    color = "red",
    fill = TRUE,
    fillOpacity = 0.5,
    group = "Train Stations"
  ) %>%
  
  # Layer controls
  leaflet::addLayersControl(
    overlayGroups = c("Population Density", "Bikeshare Docks", "Train Stations"),
    options = leaflet::layersControlOptions(collapsed = FALSE)
  )
Code
chi_bg <- get_acs(
  geography = "block group",
  variables = "B01003_001E",
  year = 2022,
  survey = "acs5",
  state = c("IL"),
  geometry = TRUE
)
Code
chi_bg <- st_transform(chi_bg, st_crs(hull_buffered_wgs84_chi))

chi_bg <- chi_bg %>%
  mutate(area_original = as.numeric(st_area(geometry)))

chi_bg_clipped <- st_intersection(chi_bg, hull_buffered_wgs84_chi)
Code
chi_bg_clipped <- chi_bg_clipped %>%
  mutate(
    area_clipped = as.numeric(st_area(geometry)),
    pop_adjusted = estimate * (area_clipped / area_original),
    area_km2 = area_clipped / 1e6,
    pop_density = pop_adjusted / area_km2
  )

chi_bg_density <- chi_bg_clipped %>%
  mutate(
    area_km2 = as.numeric(st_area(geometry)) / 1e6,
    pop_density = estimate / area_km2
  )
Code
max_density <- 30000
chi_bg_density$pop_density_plot <- pmin(chi_bg_density$pop_density, max_density)
Code
chi_bbox <- st_bbox(hull_buffered_wgs84_chi)

# bus_stops <- opq(bbox) %>%
#   add_osm_feature(key = "highway", value = "bus_stop") %>%
#   osmdata_sf() %>%
#   .$osm_points %>%
#   st_transform(st_crs(hull_buffered_wgs84_dc))

stations_chi <- opq(chi_bbox) %>%
  add_osm_feature(key = "railway", value = "station") %>%
  osmdata_sf() %>%
  .$osm_points %>%
  st_transform(st_crs(hull_buffered_wgs84_chi))
Code
pal <- leaflet::colorNumeric(viridis(20), domain = chi_bg_density$pop_density_plot)
leaflet() %>%
  leaflet::addProviderTiles("CartoDB.Positron") %>%
  
  # Population density polygons
  leaflet::addPolygons(
    data = chi_bg_density,
    fillColor = ~pal(pop_density_plot),
    fillOpacity = 0.6,
    color = "grey",
    weight = 0.5,
    group = "Population Density",
    popup = ~paste0("Population density: ", round(pop_density_plot))
  ) %>%
  
  # Dock locations
  leaflet::addCircleMarkers(
    data = docks_chi,
    radius = 1,
    color = "blue",
    fill = TRUE,
    fillOpacity = 0.5,
    group = "Bikeshare Docks"
  ) %>%
  
  # Train stations
  leaflet::addCircleMarkers(
    data = stations_chi,
    radius = 1,
    color = "red",
    fill = TRUE,
    fillOpacity = 0.5,
    group = "Train Stations"
  ) %>%
  
  # Layer controls
  leaflet::addLayersControl(
    overlayGroups = c("Population Density", "Bikeshare Docks", "Train Stations"),
    options = leaflet::layersControlOptions(collapsed = FALSE)
  )

Adding train stations and population density to these maps starts provide some strategic insights as to where docks tend to cluster. Starting with Washington, D.C., the most heavily clustered areas reliably line up with dense populations that are often served by multiple train stations in close proximity. Meanwhile, the clusters of docks that stem out from the center of the city seem to line up nearly perfectly with train stations following the same pattern, indicating intentional design to follow existing public transit infrastructure.

As mentioned in the previous section, New York and Chicago have similarly exhaustive coverage over their city, both with a higher density of docks in central areas. However, Chicago’s system seems to increase dock density exclusively in population-dense areas. New York, on the other hand, has its greatest dock density in lower Manhattan, even in neighborhoods like midtown which has low population density. In contrast with the relatively sparse coverage in high-population upper Manhattan neighborhoods, we can surmise that this system was partially designed with commercial neighborhoods taking precedent over residential ones. In both cases, public transit does not jump out as having an effect on dock locations.

Boston and San Francisco seem to diverge the most from population density, as Boston has a few dense neighborhoods that lack bikeshare coverage, and San Francisco’s most dense area is curiously surrounded by bikeshare docks but doesn’t contain them. Additionally, the docks seem to cover entirely different parts of the city than train stations in San Francisco, while no apparent pattern jumps out in Boston.

Univariate Testing

Prior to hypothesis testing, another valuable form of exploratory data analysis can be to simply quantify differences in bikeshare dock deployment via clustering. This can be accomplished by calculating Moran’s I, a statistic that measures spatial autocorrelation within a region to indicate if nodes are clustered, disperse, or somewhere in between. The following section will output Moran’s I values for all five cities, along with a leaflet plot for each one that highlights areas of greater clustering than average.

Code
coords_dc <- st_coordinates(docks_dc)

knn_dc <- knearneigh(coords_dc, k = 4)
nb_dc <- knn2nb(knn_dc)
Code
lw_dc <- nb2listw(nb_dc, style = "W")

inv_dist_dc <- sapply(1:nrow(coords_dc), function(i) {
  neighbors_dc <- nb_dc[[i]]
  if(length(neighbors_dc) == 0) return(0)
  1 / mean(sqrt((coords_dc[i,1]-coords_dc[neighbors_dc,1])^2 + (coords_dc[i,2]-coords_dc[neighbors_dc,2])^2))
})

moran_result_dc <- moran.test(inv_dist_dc, lw_dc, zero.policy = TRUE)
moran_result_dc

    Moran I test under randomisation

data:  inv_dist_dc  
weights: lw_dc    

Moran I statistic standard deviate = 35.735, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
     0.8418107214     -0.0012836970      0.0005566251 
Code
dock_values_dc <- inv_dist_dc

localI_dc <- localmoran(dock_values_dc, lw_dc, zero.policy = TRUE)
docks_dc$localI <- localI_dc[,1]

leaflet(docks_dc) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addCircleMarkers(
    radius = 4,
    color = ~viridis::viridis(20)[cut(localI, breaks = 20)],
    fillOpacity = 0.8,
    popup = ~paste0("Local Moran's I: ", round(localI, 3))
  )
Code
coords_nyc <- st_coordinates(docks_nyc)

knn_nyc <- knearneigh(coords_nyc, k = 4)
nb_nyc <- knn2nb(knn_nyc)
Code
lw_nyc <- nb2listw(nb_nyc, style = "W")

inv_dist_nyc <- sapply(1:nrow(coords_nyc), function(i) {
  neighbors_nyc <- nb_nyc[[i]]
  if(length(neighbors_nyc) == 0) return(0)
  1 / mean(sqrt((coords_nyc[i,1]-coords_nyc[neighbors_nyc,1])^2 + (coords_nyc[i,2]-coords_nyc[neighbors_nyc,2])^2))
})

moran_result_nyc <- moran.test(inv_dist_nyc, lw_nyc, zero.policy = TRUE)
moran_result_nyc

    Moran I test under randomisation

data:  inv_dist_nyc  
weights: lw_nyc    

Moran I statistic standard deviate = 48.697, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
     0.6827796248     -0.0004344049      0.0001968368 
Code
dock_values_nyc <- inv_dist_nyc

localI_nyc <- localmoran(dock_values_nyc, lw_nyc, zero.policy = TRUE)
docks_nyc$localI <- localI_nyc[,1]

leaflet(docks_nyc) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addCircleMarkers(
    radius = 4,
    color = ~viridis::viridis(20)[cut(localI, breaks = 20)],
    fillOpacity = 0.8,
    popup = ~paste0("Local Moran's I: ", round(localI, 3))
  )
Code
coords_bos <- st_coordinates(docks_bos)

knn_bos <- knearneigh(coords_bos, k = 4)
nb_bos <- knn2nb(knn_bos)

lw_bos <- nb2listw(nb_bos, style = "W")

inv_dist_bos <- sapply(1:nrow(coords_bos), function(i) {
  neighbors_bos <- nb_bos[[i]]
  if(length(neighbors_bos) == 0) return(0)
  1 / mean(sqrt((coords_bos[i,1]-coords_bos[neighbors_bos,1])^2 + (coords_bos[i,2]-coords_bos[neighbors_bos,2])^2))
})

moran_result_bos <- moran.test(inv_dist_bos, lw_bos, zero.policy = TRUE)
moran_result_bos

    Moran I test under randomisation

data:  inv_dist_bos  
weights: lw_bos    

Moran I statistic standard deviate = 23.091, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.842640558      -0.003134796       0.001341649 
Code
dock_values_bos <- inv_dist_bos

localI_bos <- localmoran(dock_values_bos, lw_bos, zero.policy = TRUE)
docks_bos$localI <- localI_bos[,1]

leaflet(docks_bos) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addCircleMarkers(
    radius = 4,
    color = ~viridis::viridis(20)[cut(localI, breaks = 20)],
    fillOpacity = 0.8,
    popup = ~paste0("Local Moran's I: ", round(localI, 3))
  )
Code
coords_sf <- st_coordinates(docks_sf)

knn_sf <- knearneigh(coords_sf, k = 4)
nb_sf <- knn2nb(knn_sf)

lw_sf <- nb2listw(nb_sf, style = "W")

inv_dist_sf <- sapply(1:nrow(coords_sf), function(i) {
  neighbors_sf <- nb_sf[[i]]
  if(length(neighbors_sf) == 0) return(0)
  1 / mean(sqrt((coords_sf[i,1]-coords_sf[neighbors_sf,1])^2 + (coords_sf[i,2]-coords_sf[neighbors_sf,2])^2))
})

moran_result_sf <- moran.test(inv_dist_sf, lw_sf, zero.policy = TRUE)
moran_result_sf

    Moran I test under randomisation

data:  inv_dist_sf  
weights: lw_sf    

Moran I statistic standard deviate = 12.936, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.808442476      -0.009433962       0.003997522 
Code
dock_values_sf <- inv_dist_sf

localI_sf <- localmoran(dock_values_sf, lw_sf, zero.policy = TRUE)
docks_sf$localI <- localI_sf[,1]

leaflet(docks_sf) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addCircleMarkers(
    radius = 4,
    color = ~viridis::viridis(20)[cut(localI, breaks = 20)],
    fillOpacity = 0.8,
    popup = ~paste0("Local Moran's I: ", round(localI, 3))
  )
Code
coords_chi <- st_coordinates(docks_chi)

knn_chi <- knearneigh(coords_chi, k = 4)
nb_chi <- knn2nb(knn_chi)

lw_chi <- nb2listw(nb_chi, style = "W")

inv_dist_chi <- sapply(1:nrow(coords_chi), function(i) {
  neighbors_chi <- nb_chi[[i]]
  if(length(neighbors_chi) == 0) return(0)
  1 / mean(sqrt((coords_chi[i,1]-coords_chi[neighbors_chi,1])^2 + (coords_chi[i,2]-coords_chi[neighbors_chi,2])^2))
})

moran_result_chi <- moran.test(inv_dist_chi, lw_chi, zero.policy = TRUE)
moran_result_chi

    Moran I test under randomisation

data:  inv_dist_chi  
weights: lw_chi    

Moran I statistic standard deviate = 43.439, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
     0.9570403158     -0.0011061947      0.0004865145 
Code
dock_values_chi <- inv_dist_chi

localI_chi <- localmoran(dock_values_chi, lw_chi, zero.policy = TRUE)
docks_chi$localI <- localI_chi[,1]

leaflet(docks_chi) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addCircleMarkers(
    radius = 4,
    color = ~viridis::viridis(20)[cut(localI, breaks = 20)],
    fillOpacity = 0.8,
    popup = ~paste0("Local Moran's I: ", round(localI, 3))
  )

Univariate Results

Metropolitan Area Moran’s I Statistic
Washington, DC \(0.8418107214\)
New York, NY \(0.6827748494\)
Boston, MA \(0.842640558\)
San Francisco, CA \(0.808442476\)
Chicago, IL \(0.9584339511\)

These maps and statistics mostly confirm some of the observations from the EDA section. The results to note are the differences outlined between New York and Chicago. Chicago’s system, while spread out over the city, clearly is characterized by intense clustering in the downtown, such that it produces the highest Moran’s I of the bunch. Meanwhile, New York’s Moran’s I tells us that most of the docks are intentionally spaced out more than in other cities, providing coverage to more neighborhoods and less favoritism towards certain regions.

The other observation from this table is that all of the Moran’s I statistics are relatively high, confirming a common-sense assumption that bikeshare docks tend to be placed in the presence of other bikeshare docks.

Hypothesis Test: Population Density

Our first hypothesis tests will pertain to population density. As a reminder the null and alternative hypotheses are:

  • Null hypothesis: There is no relationship between the spatial distribution of bikeshare docks and population density. Dock locations occur independently of areas with higher or lower population density.
  • Alternative hypothesis: There is a relationship between the spatial distribution of Capital Bikeshare docks and population density. Dock locations are more likely to occur in areas with higher population density.

To test these, we will first look at correlation functions to quantify the relationship between dock locations and population density. Specifically, we will be using a likelihood ratio test within the framework of a Poisson Point Process to get both the statistical significance of the population density coefficient and a likelihood ratio that compares a model that relies on population density to a null model that is spatially homogeneous. The following section contains the process for executing these tests, with a table displaying relevant results at the end.

Correlation Functions

Code
proj_crs <- "EPSG:32618"
docks_proj_dc <- st_transform(docks_dc, 32618)
boundary_proj_dc <- st_transform(hull_buffered_wgs84_dc, 32618)
boundary_owin_dc <- as.owin(boundary_proj_dc)

dock_coords_dc <- st_coordinates(docks_proj_dc)

dock_ppp_dc <- ppp(
  x = dock_coords_dc[,1],
  y = dock_coords_dc[,2],
  window = boundary_owin_dc
)
dc_bg_density_proj <- st_transform(dc_bg_density, proj_crs)
rtemplate_dc <- rast(
  extent = ext(dc_bg_density_proj),
  resolution = 250,
  crs = "EPSG:32618"
)

pop_rast_dc <- rasterize(
  vect(dc_bg_density_proj),
  rtemplate_dc,
  field = "pop_density_plot",
  fun = "mean"
)

pop_rast_dc <- mask(pop_rast_dc, vect(boundary_proj_dc))
Code
nx_dc <- ncol(pop_rast_dc)
ny_dc <- nrow(pop_rast_dc)
r_ext_dc <- ext(pop_rast_dc)
res_x_dc <- res(pop_rast_dc)[1]
res_y_dc <- res(pop_rast_dc)[2]

xcol_dc <- seq(r_ext_dc$xmin + res_x_dc/2, r_ext_dc$xmax - res_x_dc/2, length.out = nx_dc)
yrow_dc <- seq(r_ext_dc$ymin + res_y_dc/2, r_ext_dc$ymax - res_y_dc/2, length.out = ny_dc)

vals_dc <- values(pop_rast_dc, mat = FALSE)
mat_dc <- matrix(vals_dc, nrow = ny_dc, ncol = nx_dc, byrow = TRUE)

mat_oriented_dc <- mat_dc[ny_dc:1, , drop = FALSE]
pop_im_dc <- im(mat_oriented_dc, xcol_dc, yrow_dc)
Code
m0_dc <- ppm(dock_ppp_dc ~ 1)
m1_dc <- ppm(dock_ppp_dc ~ pop_im_dc)
summary(m1_dc)
Point process model
Fitted to data: dock_ppp_dc
Fitting method: maximum likelihood (Berman-Turner approximation)
Model was fitted using glm()
Algorithm converged
Call:
ppm.formula(Q = dock_ppp_dc ~ pop_im_dc)
Edge correction: "border"
    [border correction distance r = 0 ]
--------------------------------------------------------------------------------
Quadrature scheme (Berman-Turner) = data + dummy + weights

Data pattern:
Planar point pattern:  780 points
Average intensity 8.39e-07 points per square unit
Window: polygonal boundary
single connected closed polygon with 787 vertices
enclosing rectangle: [290970.7, 343621.3] x [4291156, 4332019] units
                     (52650 x 40860 units)
Window area = 929241000 square units
Fraction of frame area: 0.432

Dummy quadrature points:
     64 x 64 grid of dummy points, plus 4 corner points
     dummy spacing: 822.6657 x 638.4939 units

Original dummy parameters: =
Planar point pattern:  1879 points
Average intensity 2.02e-06 points per square unit
Window: polygonal boundary
single connected closed polygon with 787 vertices
enclosing rectangle: [290970.7, 343621.3] x [4291156, 4332019] units
                     (52650 x 40860 units)
Window area = 929241000 square units
Fraction of frame area: 0.432
Quadrature weights:
     (counting weights based on 64 x 64 array of rectangular tiles)
All weights:
    range: [47800, 525000]  total: 9.26e+08
Weights on data points:
    range: [47800, 263000]  total: 1.33e+08
Weights on dummy points:
    range: [47800, 525000]  total: 7.93e+08
--------------------------------------------------------------------------------
FITTED :

Nonstationary Poisson process

---- Intensity: ----

Log intensity: ~pop_im_dc
Model depends on external covariate 'pop_im_dc'
Covariates provided:
    pop_im_dc: im

Fitted trend coefficients:
  (Intercept)     pop_im_dc 
-1.457112e+01  1.530411e-04 

                 Estimate         S.E.       CI95.lo       CI95.hi Ztest
(Intercept) -1.457112e+01 4.944758e-02 -1.466804e+01 -1.447421e+01   ***
pop_im_dc    1.530411e-04 6.102866e-06  1.410797e-04  1.650025e-04   ***
                  Zval
(Intercept) -294.67814
pop_im_dc     25.07692

----------- gory details -----

Fitted regular parameters (theta):
  (Intercept)     pop_im_dc 
-1.457112e+01  1.530411e-04 

Fitted exp(theta):
 (Intercept)    pop_im_dc 
4.697240e-07 1.000153e+00 
Code
anova(m0_dc, m1_dc, test="LR")
Analysis of Deviance Table

Model 1: ~1      Poisson
Model 2: ~pop_im_dc      Poisson
  Npar Df Deviance  Pr(>Chi)    
1    1                          
2    2  1   423.41 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
docks_proj_nyc <- st_transform(docks_nyc, 32618)
boundary_proj_nyc <- st_transform(hull_buffered_wgs84_nyc, 32618)
boundary_owin_nyc <- as.owin(boundary_proj_nyc)

dock_coords_nyc <- st_coordinates(docks_proj_nyc)

dock_ppp_nyc <- ppp(
  x = dock_coords_nyc[,1],
  y = dock_coords_nyc[,2],
  window = boundary_owin_nyc
)
nyc_bg_density_proj <- st_transform(nyc_bg_density, crs = proj_crs)

rtemplate_nyc <- rast(
  extent = ext(nyc_bg_density_proj),
  resolution = 250,
  crs = "EPSG:32618"
)

pop_rast_nyc <- rasterize(
  vect(nyc_bg_density_proj),
  rtemplate_nyc,
  field = "pop_density_plot",
  fun = "mean"
)
pop_rast_nyc <- mask(pop_rast_nyc, vect(boundary_proj_nyc))
Code
nx_nyc <- ncol(pop_rast_nyc)
ny_nyc <- nrow(pop_rast_nyc)
r_ext_nyc <- ext(pop_rast_nyc)
res_x_nyc <- res(pop_rast_nyc)[1]
res_y_nyc <- res(pop_rast_nyc)[2]

xcol_nyc <- seq(r_ext_nyc$xmin + res_x_nyc/2, r_ext_nyc$xmax - res_x_nyc/2, length.out = nx_nyc)
yrow_nyc <- seq(r_ext_nyc$ymin + res_y_nyc/2, r_ext_nyc$ymax - res_y_nyc/2, length.out = ny_nyc)

vals_nyc <- values(pop_rast_nyc, mat = FALSE)
mat_nyc <- matrix(vals_nyc, nrow = ny_nyc, ncol = nx_nyc, byrow = TRUE)

mat_oriented_nyc <- mat_nyc[ny_nyc:1, , drop = FALSE]
pop_im_nyc <- im(mat_oriented_nyc, xcol_nyc, yrow_nyc)
Code
m0_nyc <- ppm(dock_ppp_nyc ~ 1)
m1_nyc <- ppm(dock_ppp_nyc ~ pop_im_nyc)
Code
summary(m1_nyc)
Point process model
Fitted to data: dock_ppp_nyc
Fitting method: maximum likelihood (Berman-Turner approximation)
Model was fitted using glm()
Algorithm converged
Call:
ppm.formula(Q = dock_ppp_nyc ~ pop_im_nyc)
Edge correction: "border"
    [border correction distance r = 0 ]
--------------------------------------------------------------------------------
Quadrature scheme (Berman-Turner) = data + dummy + weights

Data pattern:
Planar point pattern:  2303 points
Average intensity 7.85e-06 points per square unit
Window: polygonal boundary
single connected closed polygon with 372 vertices
enclosing rectangle: [576179.5, 597719.7] x [4495675, 4527093] units
                     (21540 x 31420 units)
Window area = 293255000 square units
Fraction of frame area: 0.433

Dummy quadrature points:
     100 x 100 grid of dummy points, plus 4 corner points
     dummy spacing: 215.4018 x 314.1755 units

Original dummy parameters: =
Planar point pattern:  4449 points
Average intensity 1.52e-05 points per square unit
Window: polygonal boundary
single connected closed polygon with 372 vertices
enclosing rectangle: [576179.5, 597719.7] x [4495675, 4527093] units
                     (21540 x 31420 units)
Window area = 293255000 square units
Fraction of frame area: 0.433
Quadrature weights:
     (counting weights based on 100 x 100 array of rectangular tiles)
All weights:
    range: [8790, 67700]    total: 2.93e+08
Weights on data points:
    range: [11300, 33800]   total: 69200000
Weights on dummy points:
    range: [8790, 67700]    total: 2.24e+08
--------------------------------------------------------------------------------
FITTED :

Nonstationary Poisson process

---- Intensity: ----

Log intensity: ~pop_im_nyc
Model depends on external covariate 'pop_im_nyc'
Covariates provided:
    pop_im_nyc: im

Fitted trend coefficients:
  (Intercept)    pop_im_nyc 
-1.207053e+01  1.589846e-05 

                 Estimate         S.E.       CI95.lo       CI95.hi Ztest
(Intercept) -1.207053e+01 3.142609e-02 -1.213213e+01 -1.200894e+01   ***
pop_im_nyc   1.589846e-05 9.835651e-07  1.397071e-05  1.782621e-05   ***
                  Zval
(Intercept) -384.09275
pop_im_nyc    16.16412

----------- gory details -----

Fitted regular parameters (theta):
  (Intercept)    pop_im_nyc 
-1.207053e+01  1.589846e-05 

Fitted exp(theta):
 (Intercept)   pop_im_nyc 
5.725780e-06 1.000016e+00 
Problem:
 Values of the covariate 'pop_im_nyc' were NA or undefined at 0.9% (61 out of 6752) of the quadrature points 
Code
anova(m0_nyc, m1_nyc, test="LR")
Analysis of Deviance Table

Model 1: ~1      Poisson
Model 2: ~pop_im_nyc     Poisson
  Npar Df Deviance  Pr(>Chi)    
1   62                          
2   63  1   235.86 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
docks_proj_bos <- st_transform(docks_bos, 32618)
boundary_proj_bos <- st_transform(hull_buffered_wgs84_bos, 32618)
boundary_owin_bos <- as.owin(boundary_proj_bos)

dock_coords_bos <- st_coordinates(docks_proj_bos)

dock_ppp_bos <- ppp(
  x = dock_coords_bos[,1],
  y = dock_coords_bos[,2],
  window = boundary_owin_bos
)
bos_bg_density_proj <- st_transform(bos_bg_density, crs = proj_crs)

rtemplate_bos <- rast(
  extent = ext(bos_bg_density_proj),
  resolution = 250,
  crs = "EPSG:32618"
)

pop_rast_bos <- rasterize(
  vect(bos_bg_density_proj),
  rtemplate_bos,
  field = "pop_density_plot",
  fun = "mean"
)
pop_rast_bos <- mask(pop_rast_bos, vect(boundary_proj_bos))
Code
nx_bos <- ncol(pop_rast_bos)
ny_bos <- nrow(pop_rast_bos)
r_ext_bos <- ext(pop_rast_bos)
res_x_bos <- res(pop_rast_bos)[1]
res_y_bos <- res(pop_rast_bos)[2]

xcol_bos <- seq(r_ext_bos$xmin + res_x_bos/2, r_ext_bos$xmax - res_x_bos/2, length.out = nx_bos)
yrow_bos <- seq(r_ext_bos$ymin + res_y_bos/2, r_ext_bos$ymax - res_y_bos/2, length.out = ny_bos)

vals_bos <- values(pop_rast_bos, mat = FALSE)
mat_bos <- matrix(vals_bos, nrow = ny_bos, ncol = nx_bos, byrow = TRUE)

mat_oriented_bos <- mat_bos[ny_bos:1, , drop = FALSE]
pop_im_bos <- im(mat_oriented_bos, xcol_bos, yrow_bos)
Code
m0_bos <- ppm(dock_ppp_bos ~ 1)
m1_bos <- ppm(dock_ppp_bos ~ pop_im_bos)
Code
summary(m1_bos)
Point process model
Fitted to data: dock_ppp_bos
Fitting method: maximum likelihood (Berman-Turner approximation)
Model was fitted using glm()
Algorithm converged
Call:
ppm.formula(Q = dock_ppp_bos ~ pop_im_bos)
Edge correction: "border"
    [border correction distance r = 0 ]
--------------------------------------------------------------------------------
Quadrature scheme (Berman-Turner) = data + dummy + weights

Data pattern:
Planar point pattern:  320 points
Average intensity 2.01e-06 points per square unit
Window: polygonal boundary
single connected closed polygon with 448 vertices
enclosing rectangle: [813201.2, 830538] x [4687280, 4704924] units
                     (17340 x 17640 units)
Window area = 158844000 square units
Fraction of frame area: 0.519

Dummy quadrature points:
     64 x 64 grid of dummy points, plus 4 corner points
     dummy spacing: 270.8883 x 275.6815 units

Original dummy parameters: =
Planar point pattern:  2206 points
Average intensity 1.39e-05 points per square unit
Window: polygonal boundary
single connected closed polygon with 448 vertices
enclosing rectangle: [813201.2, 830538] x [4687280, 4704924] units
                     (17340 x 17640 units)
Window area = 158844000 square units
Fraction of frame area: 0.519
Quadrature weights:
     (counting weights based on 64 x 64 array of rectangular tiles)
All weights:
    range: [9790, 74700]    total: 1.58e+08
Weights on data points:
    range: [18700, 37300]   total: 10900000
Weights on dummy points:
    range: [9790, 74700]    total: 1.48e+08
--------------------------------------------------------------------------------
FITTED :

Nonstationary Poisson process

---- Intensity: ----

Log intensity: ~pop_im_bos
Model depends on external covariate 'pop_im_bos'
Covariates provided:
    pop_im_bos: im

Fitted trend coefficients:
  (Intercept)    pop_im_bos 
-1.347470e+01  5.220753e-05 

                 Estimate         S.E.       CI95.lo       CI95.hi Ztest
(Intercept) -1.347470e+01 8.856857e-02 -1.364830e+01 -1.330111e+01   ***
pop_im_bos   5.220753e-05 8.263342e-06  3.601168e-05  6.840339e-05   ***
                   Zval
(Intercept) -152.138664
pop_im_bos     6.317968

----------- gory details -----

Fitted regular parameters (theta):
  (Intercept)    pop_im_bos 
-1.347470e+01  5.220753e-05 

Fitted exp(theta):
 (Intercept)   pop_im_bos 
1.406081e-06 1.000052e+00 
Problem:
 Values of the covariate 'pop_im_bos' were NA or undefined at 1.4% (36 out of 2526) of the quadrature points 
Code
anova(m0_bos, m1_bos, test="LR")
Analysis of Deviance Table

Model 1: ~1      Poisson
Model 2: ~pop_im_bos     Poisson
  Npar Df Deviance  Pr(>Chi)    
1   37                          
2   38  1   35.813 2.172e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
docks_proj_sf <- st_transform(docks_sf, 32618)
boundary_proj_sf <- st_transform(hull_buffered_wgs84_sf, 32618)
boundary_owin_sf <- as.owin(boundary_proj_sf)

dock_coords_sf <- st_coordinates(docks_proj_sf)

dock_ppp_sf <- ppp(
  x = dock_coords_sf[,1],
  y = dock_coords_sf[,2],
  window = boundary_owin_sf
)
sf_bg_density_proj <- st_transform(sf_bg_density, crs = proj_crs)

rtemplate_sf <- rast(
  extent = ext(sf_bg_density_proj),
  resolution = 250,
  crs = "EPSG:32618"
)

pop_rast_sf <- rasterize(
  vect(sf_bg_density_proj),
  rtemplate_sf,
  field = "pop_density_plot",
  fun = "mean"
)
pop_rast_sf <- mask(pop_rast_sf, vect(boundary_proj_sf))
Code
nx_sf <- ncol(pop_rast_sf)
ny_sf <- nrow(pop_rast_sf)
r_ext_sf <- ext(pop_rast_sf)
res_x_sf <- res(pop_rast_sf)[1]
res_y_sf <- res(pop_rast_sf)[2]

xcol_sf <- seq(r_ext_sf$xmin + res_x_sf/2, r_ext_sf$xmax - res_x_sf/2, length.out = nx_sf)
yrow_sf <- seq(r_ext_sf$ymin + res_y_sf/2, r_ext_sf$ymax - res_y_sf/2, length.out = ny_sf)

vals_sf <- values(pop_rast_sf, mat = FALSE)
mat_sf <- matrix(vals_sf, nrow = ny_sf, ncol = nx_sf, byrow = TRUE)

mat_oriented_sf <- mat_sf[ny_sf:1, , drop = FALSE]
pop_im_sf <- im(mat_oriented_sf, xcol_sf, yrow_sf)
Code
m0_sf <- ppm(dock_ppp_sf ~ 1)
m1_sf <- ppm(dock_ppp_sf ~ pop_im_sf)
Code
summary(m1_sf)
Point process model
Fitted to data: dock_ppp_sf
Fitting method: maximum likelihood (Berman-Turner approximation)
Model was fitted using glm()
Algorithm converged
Call:
ppm.formula(Q = dock_ppp_sf ~ pop_im_sf)
Edge correction: "border"
    [border correction distance r = 0 ]
--------------------------------------------------------------------------------
Quadrature scheme (Berman-Turner) = data + dummy + weights

Data pattern:
Planar point pattern:  107 points
Average intensity 8.79e-07 points per square unit
Window: polygonal boundary
single connected closed polygon with 521 vertices
enclosing rectangle: [-3757682, -3743953] x [5411973, 5426315] units
                     (13730 x 14340 units)
Window area = 121743000 square units
Fraction of frame area: 0.618

Dummy quadrature points:
     32 x 32 grid of dummy points, plus 4 corner points
     dummy spacing: 429.0211 x 448.1738 units

Original dummy parameters: =
Planar point pattern:  696 points
Average intensity 5.72e-06 points per square unit
Window: polygonal boundary
single connected closed polygon with 521 vertices
enclosing rectangle: [-3757682, -3743953] x [5411973, 5426315] units
                     (13730 x 14340 units)
Window area = 121743000 square units
Fraction of frame area: 0.618
Quadrature weights:
     (counting weights based on 32 x 32 array of rectangular tiles)
All weights:
    range: [9630, 192000]   total: 1.22e+08
Weights on data points:
    range: [48100, 96100]   total: 8910000
Weights on dummy points:
    range: [9630, 192000]   total: 1.13e+08
--------------------------------------------------------------------------------
FITTED :

Nonstationary Poisson process

---- Intensity: ----

Log intensity: ~pop_im_sf
Model depends on external covariate 'pop_im_sf'
Covariates provided:
    pop_im_sf: im

Fitted trend coefficients:
  (Intercept)     pop_im_sf 
-1.424833e+01  3.528686e-05 

                 Estimate         S.E.       CI95.lo       CI95.hi Ztest
(Intercept) -1.424833e+01 1.657817e-01 -1.457326e+01 -1.392340e+01   ***
pop_im_sf    3.528686e-05 1.276963e-05  1.025885e-05  6.031487e-05    **
                  Zval
(Intercept) -85.946323
pop_im_sf     2.763343

----------- gory details -----

Fitted regular parameters (theta):
  (Intercept)     pop_im_sf 
-1.424833e+01  3.528686e-05 

Fitted exp(theta):
 (Intercept)    pop_im_sf 
6.486774e-07 1.000035e+00 
Problem:
 Values of the covariate 'pop_im_sf' were NA or undefined at 3.9% (31 out of 803) of the quadrature points 
Code
anova(m0_sf, m1_sf, test="LR")
Analysis of Deviance Table

Model 1: ~1      Poisson
Model 2: ~pop_im_sf      Poisson
  Npar Df Deviance Pr(>Chi)   
1   32                        
2   33  1     6.96 0.008335 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
docks_proj_chi <- st_transform(docks_chi, 32618)
boundary_proj_chi <- st_transform(hull_buffered_wgs84_chi, 32618)
boundary_owin_chi <- as.owin(boundary_proj_chi)

dock_coords_chi <- st_coordinates(docks_proj_chi)

dock_ppp_chi <- ppp(
  x = dock_coords_chi[,1],
  y = dock_coords_chi[,2],
  window = boundary_owin_chi
)
chi_bg_density_proj <- st_transform(chi_bg_density, crs = proj_crs)

rtemplate_chi <- rast(
  extent = ext(chi_bg_density_proj),
  resolution = 250,
  crs = "EPSG:32618"
)

pop_rast_chi <- rasterize(
  vect(chi_bg_density_proj),
  rtemplate_chi,
  field = "pop_density_plot",
  fun = "mean"
)
pop_rast_chi <- mask(pop_rast_chi, vect(boundary_proj_chi))
Code
nx_chi <- ncol(pop_rast_chi)
ny_chi <- nrow(pop_rast_chi)
r_ext_chi <- ext(pop_rast_chi)
res_x_chi <- res(pop_rast_chi)[1]
res_y_chi <- res(pop_rast_chi)[2]

xcol_chi <- seq(r_ext_chi$xmin + res_x_chi/2, r_ext_chi$xmax - res_x_chi/2, length.out = nx_chi)
yrow_chi <- seq(r_ext_chi$ymin + res_y_chi/2, r_ext_chi$ymax - res_y_chi/2, length.out = ny_chi)

vals_chi <- values(pop_rast_chi, mat = FALSE)
mat_chi <- matrix(vals_chi, nrow = ny_chi, ncol = nx_chi, byrow = TRUE)

mat_oriented_chi <- mat_chi[ny_chi:1, , drop = FALSE]
pop_im_chi <- im(mat_oriented_chi, xcol_chi, yrow_chi)
Code
m0_chi <- ppm(dock_ppp_chi ~ 1)
m1_chi <- ppm(dock_ppp_chi ~ pop_im_chi)
Code
summary(m1_chi)
Point process model
Fitted to data: dock_ppp_chi
Fitting method: maximum likelihood (Berman-Turner approximation)
Model was fitted using glm()
Algorithm converged
Call:
ppm.formula(Q = dock_ppp_chi ~ pop_im_chi)
Edge correction: "border"
    [border correction distance r = 0 ]
--------------------------------------------------------------------------------
Quadrature scheme (Berman-Turner) = data + dummy + weights

Data pattern:
Planar point pattern:  905 points
Average intensity 1.09e-06 points per square unit
Window: polygonal boundary
single connected closed polygon with 212 vertices
enclosing rectangle: [-566382.1, -541020.8] x [4685608, 4737799] units
                     (25360 x 52190 units)
Window area = 833001000 square units
Fraction of frame area: 0.629

Dummy quadrature points:
     70 x 70 grid of dummy points, plus 4 corner points
     dummy spacing: 362.3043 x 745.5840 units

Original dummy parameters: =
Planar point pattern:  3156 points
Average intensity 3.79e-06 points per square unit
Window: polygonal boundary
single connected closed polygon with 212 vertices
enclosing rectangle: [-566382.1, -541020.8] x [4685608, 4737799] units
                     (25360 x 52190 units)
Window area = 833001000 square units
Fraction of frame area: 0.629
Quadrature weights:
     (counting weights based on 70 x 70 array of rectangular tiles)
All weights:
    range: [24600, 270000]  total: 8.32e+08
Weights on data points:
    range: [24600, 135000]  total: 1.07e+08
Weights on dummy points:
    range: [24600, 270000]  total: 7.25e+08
--------------------------------------------------------------------------------
FITTED :

Nonstationary Poisson process

---- Intensity: ----

Log intensity: ~pop_im_chi
Model depends on external covariate 'pop_im_chi'
Covariates provided:
    pop_im_chi: im

Fitted trend coefficients:
  (Intercept)    pop_im_chi 
-1.412922e+01  8.864401e-05 

                 Estimate         S.E.       CI95.lo       CI95.hi Ztest
(Intercept) -1.412922e+01 4.738210e-02 -1.422209e+01 -1.403635e+01   ***
pop_im_chi   8.864401e-05 4.824293e-06  7.918857e-05  9.809945e-05   ***
                  Zval
(Intercept) -298.19746
pop_im_chi    18.37451

----------- gory details -----

Fitted regular parameters (theta):
  (Intercept)    pop_im_chi 
-1.412922e+01  8.864401e-05 

Fitted exp(theta):
 (Intercept)   pop_im_chi 
7.307299e-07 1.000089e+00 
Problem:
 Values of the covariate 'pop_im_chi' were NA or undefined at 7% (286 out of 4061) of the quadrature points 
Code
anova(m0_chi, m1_chi, test="LR")
Analysis of Deviance Table

Model 1: ~1      Poisson
Model 2: ~pop_im_chi     Poisson
  Npar Df Deviance  Pr(>Chi)    
1  287                          
2  288  1   243.36 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation Function Results against Population Density

Metropolitan Area Docks pop_im coefficient Z-value Deviance diff
Washington, DC \(780\) \(1.530 \times 10^{-4}\) \(25.07\) \(423.41\)
New York, NY \(2,303\) \(1.590 \times 10^{-5}\) \(16.16\) \(235.86\)
Boston, MA \(320\) \(5.221 \times 10^{-5}\) \(6.32\) \(35.81\)
San Francisco, CA \(107\) \(3.529 \times 10^{-5}\) \(2.76\) \(6.96\)
Chicago, IL \(909\) \(8.919 \times 10^{-5}\) \(18.61\) \(248.94\)

This table offers us a good summation of how bikeshare dock locations correlate to population density. The first thing to note is that each test placed great importance on the population density coefficient, confirming that population density is positively correlated with docks in all five cities. However, the specific Z-values tell us that the extent of these correlations are not the same.

Going in descending order of Z-value, Washington’s system most closely follows population density, followed by Chicago and New York, while Boston and Chicago’s docks are relatively loosely tied to where people live.

Monte Carlo Simulations

In addition to our Poisson Point Process, performing Monte Carlo simulations can provide extra confirmation of our findings by comparing the real-world deployments of bikeshare docks with randomized placements based on controlled conditions. For testing this hypothesis, all we need to do is simulate randomized placements that are based on no outside bias and see if we get results that are more or less tethered to population density. This will be quantified by checking how the population density coefficients compare to the real value. The proportion of simulated coefficients that exceed the real coefficient will serve as our p-value. The following section executes 999 simulations for each city, with a table of results at the end.

Code
set.seed(202)
model_obs_dc <- ppm(dock_ppp_dc ~ pop_im_dc)
coef_obs_dc <- coef(model_obs_dc)["pop_im_dc"]

nsim <- 999
coef_sim_dc <- numeric(nsim)

for (i in 1:nsim) {
  sim_ppp_dc <- runifpoint(n = dock_ppp_dc$n, win = boundary_owin_dc)
  sim_model_dc <- ppm(sim_ppp_dc ~ pop_im_dc)
  coef_sim_dc[i] <- coef(sim_model_dc)["pop_im_dc"]
}
Code
p_value_dc <- (sum(coef_sim_dc >= coef_obs_dc) + 1) / (nsim + 1)

hist(
  coef_sim_dc,
  main = "Washington, DC: Monte Carlo Distribution of Population Density Coefficients",
  xlab = "Simulated Population Density Coefficient",
  breaks = 30,
  col = "lightgray",
  border = "white",
  xlim = c(min(coef_sim_dc, coef_obs_dc), max(coef_sim_dc, coef_obs_dc))
)
abline(v = coef_obs_dc, col = "red", lwd = 3)

Code
p_value_dc <- (sum(coef_sim_dc >= coef_obs_dc) + 1) / (nsim + 1)
p_value_dc
[1] 0.001
Code
model_obs_nyc <- ppm(dock_ppp_nyc ~ pop_im_nyc)
Code
coef_obs_nyc <- coef(model_obs_nyc)["pop_im_nyc"]

nsim <- 999
coef_sim_nyc <- numeric(nsim)

for (i in 1:nsim) {
  sim_ppp_nyc <- runifpoint(n = dock_ppp_nyc$n, win = boundary_owin_nyc)
  sim_model_nyc <- ppm(sim_ppp_nyc ~ pop_im_nyc)
  coef_sim_nyc[i] <- coef(sim_model_nyc)["pop_im_nyc"]
}
Code
p_value_nyc <- (sum(coef_sim_nyc >= coef_obs_nyc) + 1) / (nsim + 1)

hist(
  coef_sim_nyc,
  main = "New York, NY: Monte Carlo Distribution of Population Density Coefficients",
  xlab = "Simulated Population Density Coefficient",
  breaks = 30,
  col = "lightgray",
  border = "white",
  xlim = c(min(coef_sim_nyc, coef_obs_nyc), max(coef_sim_nyc, coef_obs_nyc))
)
abline(v = coef_obs_nyc, col = "red", lwd = 3)

Code
p_value_nyc <- (sum(coef_sim_nyc >= coef_obs_nyc) + 1) / (nsim + 1)
p_value_nyc
[1] 0.001
Code
model_obs_bos <- ppm(dock_ppp_bos ~ pop_im_bos)
Code
coef_obs_bos <- coef(model_obs_bos)["pop_im_bos"]

nsim <- 999
coef_sim_bos <- numeric(nsim)

for (i in 1:nsim) {
  sim_ppp_bos <- runifpoint(n = dock_ppp_bos$n, win = boundary_owin_bos)
  sim_model_bos <- ppm(sim_ppp_bos ~ pop_im_bos)
  coef_sim_bos[i] <- coef(sim_model_bos)["pop_im_bos"]
}
Code
p_value_bos <- (sum(coef_sim_bos >= coef_obs_bos) + 1) / (nsim + 1)

hist(
  coef_sim_bos,
  main = "Boston, MA: Monte Carlo Distribution of Population Density Coefficients",
  xlab = "Simulated Population Density Coefficient",
  breaks = 30,
  col = "lightgray",
  border = "white",
  xlim = c(min(coef_sim_bos, coef_obs_bos), max(coef_sim_bos, coef_obs_bos))
)
abline(v = coef_obs_bos, col = "red", lwd = 3)

Code
p_value_bos <- (sum(coef_sim_bos >= coef_obs_bos) + 1) / (nsim + 1)
p_value_bos
[1] 0.001
Code
model_obs_sf <- ppm(dock_ppp_sf ~ pop_im_sf)
Code
coef_obs_sf <- coef(model_obs_sf)["pop_im_sf"]

nsim <- 999
coef_sim_sf <- numeric(nsim)

for (i in 1:nsim) {
  sim_ppp_sf <- runifpoint(n = dock_ppp_sf$n, win = boundary_owin_sf)
  sim_model_sf <- ppm(sim_ppp_sf ~ pop_im_sf)
  coef_sim_sf[i] <- coef(sim_model_sf)["pop_im_sf"]
}
Code
p_value_sf <- (sum(coef_sim_sf >= coef_obs_sf) + 1) / (nsim + 1)

hist(
  coef_sim_sf,
  main = "San Francisco, CA: Monte Carlo Distribution of Population Density Coefficients",
  xlab = "Simulated Population Density Coefficient",
  breaks = 30,
  col = "lightgray",
  border = "white",
  xlim = c(min(coef_sim_sf, coef_obs_sf), max(coef_sim_sf, coef_obs_sf))
)
abline(v = coef_obs_sf, col = "red", lwd = 3)

Code
p_value_sf <- (sum(coef_sim_sf >= coef_obs_sf) + 1) / (nsim + 1)
p_value_sf
[1] 0.011
Code
model_obs_chi <- ppm(dock_ppp_chi ~ pop_im_chi)
Code
coef_obs_chi <- coef(model_obs_chi)["pop_im_chi"]

nsim <- 999
coef_sim_chi <- numeric(nsim)

for (i in 1:nsim) {
  sim_ppp_chi <- runifpoint(n = dock_ppp_chi$n, win = boundary_owin_chi)
  sim_model_chi <- ppm(sim_ppp_chi ~ pop_im_chi)
  coef_sim_chi[i] <- coef(sim_model_chi)["pop_im_chi"]
}
Code
p_value_chi <- (sum(coef_sim_chi >= coef_obs_chi) + 1) / (nsim + 1)

hist(
  coef_sim_chi,
  main = "Chicago, IL: Monte Carlo Distribution of Population Density Coefficients",
  xlab = "Simulated Population Density Coefficient",
  breaks = 30,
  col = "lightgray",
  border = "white",
  xlim = c(min(coef_sim_chi, coef_obs_chi), max(coef_sim_chi, coef_obs_chi))
)
abline(v = coef_obs_chi, col = "red", lwd = 3)

Code
p_value_chi <- (sum(coef_sim_chi >= coef_obs_chi) + 1) / (nsim + 1)
p_value_chi
[1] 0.001

Monte Carlo Simulation Results against Population Density

Metropolitan Area p-value
Washington, DC \(0.001\)
New York, NY \(0.001\)
Boston, MA \(0.001\)
San Francisco, CA \(0.011\)
Chicago, IL \(0.001\)

This process, while it emphatically confirms our alternate hypothesis in almost all cases, proves less useful than deriving statistics via a Poisson Point Process. Looking at the histogram of population density coefficients, the real coefficient is comically far from all simulated results in Washington, New York, and Chicago. The interesting finding here is that San Francisco did produce some simulations with coefficients that exceeded the real value. This once again confirms that San Francisco’s bikeshare system is uniquely only loosely tied to population density.

Hypothesis Test: Train Stations

While testing bikeshare dock locations against population density revealed some interesting differences in deployment strategy, it is reasonable to presume that there are other factors that influence these decisions. As stated in the introduction and literature review, considering access to public transit can tell us a lot about the purpose of bikeshare: whether it targets people who lack other car-free transit options, serves as a complement to public transit, or does not consider this factor at all. Thus, we will be testing the following hypotheses:

  • Null hypothesis: There is no relationship between the spatial distribution of bikeshare docks and the presence of other public transit nodes.

  • Alternative hypothesis: There is a relationship between the spatial distribution of bikeshare docks and the presence of other public transit nodes. Dock locations are more likely to be placed in areas with existing public transit infrastructure.

The first method of testing these hypotheses will be similar to the correlation functions from the previous section. We will again use Poisson Point Processes to obtain the coefficient of the distance from each dock to its nearest train station, as well as the likelihood ratio to compare to a null model.

Correlation Functions

Code
stations_proj_dc <- st_transform(stations_dc, 32618)
docks_proj_dc <- st_transform(docks_dc, 32618)
boundary_proj_dc <- st_transform(hull_buffered_wgs84_dc, 32618)

boundary_owin_dc <- as.owin(boundary_proj_dc)
dock_coords_dc <- st_coordinates(docks_proj_dc)
dock_ppp_dc <- ppp(x = dock_coords_dc[,1],
                y = dock_coords_dc[,2],
                window = boundary_owin_dc)

station_coords_dc <- st_coordinates(stations_proj_dc)
station_ppp_dc <- ppp(x = station_coords_dc[,1],
                   y = station_coords_dc[,2],
                   window = boundary_owin_dc)
Code
dist_im_dc <- distmap(station_ppp_dc)
Code
m0_dc <- ppm(dock_ppp_dc ~ 1)
m1_dc <- ppm(dock_ppp_dc ~ dist_im_dc)

anova(m0_dc, m1_dc, test = "LR")
Analysis of Deviance Table

Model 1: ~1      Poisson
Model 2: ~dist_im_dc     Poisson
  Npar Df Deviance  Pr(>Chi)    
1    1                          
2    2  1   584.99 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
summary(m1_dc)
Point process model
Fitted to data: dock_ppp_dc
Fitting method: maximum likelihood (Berman-Turner approximation)
Model was fitted using glm()
Algorithm converged
Call:
ppm.formula(Q = dock_ppp_dc ~ dist_im_dc)
Edge correction: "border"
    [border correction distance r = 0 ]
--------------------------------------------------------------------------------
Quadrature scheme (Berman-Turner) = data + dummy + weights

Data pattern:
Planar point pattern:  780 points
Average intensity 8.39e-07 points per square unit
Window: polygonal boundary
single connected closed polygon with 787 vertices
enclosing rectangle: [290970.7, 343621.3] x [4291156, 4332019] units
                     (52650 x 40860 units)
Window area = 929241000 square units
Fraction of frame area: 0.432

Dummy quadrature points:
     64 x 64 grid of dummy points, plus 4 corner points
     dummy spacing: 822.6657 x 638.4939 units

Original dummy parameters: =
Planar point pattern:  1879 points
Average intensity 2.02e-06 points per square unit
Window: polygonal boundary
single connected closed polygon with 787 vertices
enclosing rectangle: [290970.7, 343621.3] x [4291156, 4332019] units
                     (52650 x 40860 units)
Window area = 929241000 square units
Fraction of frame area: 0.432
Quadrature weights:
     (counting weights based on 64 x 64 array of rectangular tiles)
All weights:
    range: [47800, 525000]  total: 9.26e+08
Weights on data points:
    range: [47800, 263000]  total: 1.33e+08
Weights on dummy points:
    range: [47800, 525000]  total: 7.93e+08
--------------------------------------------------------------------------------
FITTED :

Nonstationary Poisson process

---- Intensity: ----

Log intensity: ~dist_im_dc
Model depends on external covariate 'dist_im_dc'
Covariates provided:
    dist_im_dc: im

Fitted trend coefficients:
  (Intercept)    dist_im_dc 
-12.634712201  -0.000908062 

                 Estimate         S.E.       CI95.lo       CI95.hi Ztest
(Intercept) -12.634712201 6.202134e-02 -12.756271787 -1.251315e+01   ***
dist_im_dc   -0.000908062 4.703797e-05  -0.001000255 -8.158692e-04   ***
                  Zval
(Intercept) -203.71558
dist_im_dc   -19.30487

----------- gory details -----

Fitted regular parameters (theta):
  (Intercept)    dist_im_dc 
-12.634712201  -0.000908062 

Fitted exp(theta):
 (Intercept)   dist_im_dc 
3.256973e-06 9.990924e-01 
Code
stations_proj_nyc <- st_transform(stations_nyc, 32618)
docks_proj_nyc <- st_transform(docks_nyc, 32618)
boundary_proj_nyc <- st_transform(hull_buffered_wgs84_nyc, 32618)

boundary_owin_nyc <- as.owin(boundary_proj_nyc)
dock_coords_nyc <- st_coordinates(docks_proj_nyc)
dock_ppp_nyc <- ppp(x = dock_coords_nyc[,1],
                y = dock_coords_nyc[,2],
                window = boundary_owin_nyc)

station_coords_nyc <- st_coordinates(stations_proj_nyc)
station_ppp_nyc <- ppp(x = station_coords_nyc[,1],
                   y = station_coords_nyc[,2],
                   window = boundary_owin_nyc)
Code
dist_im_nyc <- distmap(station_ppp_nyc)
Code
m0_nyc <- ppm(dock_ppp_nyc ~ 1)
m1_nyc <- ppm(dock_ppp_nyc ~ dist_im_nyc)

anova(m0_nyc, m1_nyc, test = "LR")
Analysis of Deviance Table

Model 1: ~1      Poisson
Model 2: ~dist_im_nyc    Poisson
  Npar Df Deviance  Pr(>Chi)    
1    1                          
2    2  1   420.63 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
summary(m1_nyc)
Point process model
Fitted to data: dock_ppp_nyc
Fitting method: maximum likelihood (Berman-Turner approximation)
Model was fitted using glm()
Algorithm converged
Call:
ppm.formula(Q = dock_ppp_nyc ~ dist_im_nyc)
Edge correction: "border"
    [border correction distance r = 0 ]
--------------------------------------------------------------------------------
Quadrature scheme (Berman-Turner) = data + dummy + weights

Data pattern:
Planar point pattern:  2303 points
Average intensity 7.85e-06 points per square unit
Window: polygonal boundary
single connected closed polygon with 372 vertices
enclosing rectangle: [576179.5, 597719.7] x [4495675, 4527093] units
                     (21540 x 31420 units)
Window area = 293255000 square units
Fraction of frame area: 0.433

Dummy quadrature points:
     100 x 100 grid of dummy points, plus 4 corner points
     dummy spacing: 215.4018 x 314.1755 units

Original dummy parameters: =
Planar point pattern:  4449 points
Average intensity 1.52e-05 points per square unit
Window: polygonal boundary
single connected closed polygon with 372 vertices
enclosing rectangle: [576179.5, 597719.7] x [4495675, 4527093] units
                     (21540 x 31420 units)
Window area = 293255000 square units
Fraction of frame area: 0.433
Quadrature weights:
     (counting weights based on 100 x 100 array of rectangular tiles)
All weights:
    range: [8790, 67700]    total: 2.93e+08
Weights on data points:
    range: [11300, 33800]   total: 69200000
Weights on dummy points:
    range: [8790, 67700]    total: 2.24e+08
--------------------------------------------------------------------------------
FITTED :

Nonstationary Poisson process

---- Intensity: ----

Log intensity: ~dist_im_nyc
Model depends on external covariate 'dist_im_nyc'
Covariates provided:
    dist_im_nyc: im

Fitted trend coefficients:
  (Intercept)   dist_im_nyc 
-11.136186650  -0.001217601 

                 Estimate         S.E.       CI95.lo       CI95.hi Ztest
(Intercept) -11.136186650 3.552925e-02 -11.205822694 -11.066550606   ***
dist_im_nyc  -0.001217601 6.663248e-05  -0.001348198  -0.001087004   ***
                  Zval
(Intercept) -313.43717
dist_im_nyc  -18.27339

----------- gory details -----

Fitted regular parameters (theta):
  (Intercept)   dist_im_nyc 
-11.136186650  -0.001217601 

Fitted exp(theta):
 (Intercept)  dist_im_nyc 
1.457524e-05 9.987831e-01 
Code
stations_proj_bos <- st_transform(stations_bos, 32618)
docks_proj_bos <- st_transform(docks_bos, 32618)
boundary_proj_bos <- st_transform(hull_buffered_wgs84_bos, 32618)

boundary_owin_bos <- as.owin(boundary_proj_bos)
dock_coords_bos <- st_coordinates(docks_proj_bos)
dock_ppp_bos <- ppp(x = dock_coords_bos[,1],
                y = dock_coords_bos[,2],
                window = boundary_owin_bos)

station_coords_bos <- st_coordinates(stations_proj_bos)
station_ppp_bos <- ppp(x = station_coords_bos[,1],
                   y = station_coords_bos[,2],
                   window = boundary_owin_bos)
Code
dist_im_bos <- distmap(station_ppp_bos)
Code
m0_bos <- ppm(dock_ppp_bos ~ 1)
m1_bos <- ppm(dock_ppp_bos ~ dist_im_bos)

anova(m0_bos, m1_bos, test = "LR")
Analysis of Deviance Table

Model 1: ~1      Poisson
Model 2: ~dist_im_bos    Poisson
  Npar Df Deviance  Pr(>Chi)    
1    1                          
2    2  1   115.03 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
summary(m1_bos)
Point process model
Fitted to data: dock_ppp_bos
Fitting method: maximum likelihood (Berman-Turner approximation)
Model was fitted using glm()
Algorithm converged
Call:
ppm.formula(Q = dock_ppp_bos ~ dist_im_bos)
Edge correction: "border"
    [border correction distance r = 0 ]
--------------------------------------------------------------------------------
Quadrature scheme (Berman-Turner) = data + dummy + weights

Data pattern:
Planar point pattern:  320 points
Average intensity 2.01e-06 points per square unit
Window: polygonal boundary
single connected closed polygon with 448 vertices
enclosing rectangle: [813201.2, 830538] x [4687280, 4704924] units
                     (17340 x 17640 units)
Window area = 158844000 square units
Fraction of frame area: 0.519

Dummy quadrature points:
     64 x 64 grid of dummy points, plus 4 corner points
     dummy spacing: 270.8883 x 275.6815 units

Original dummy parameters: =
Planar point pattern:  2206 points
Average intensity 1.39e-05 points per square unit
Window: polygonal boundary
single connected closed polygon with 448 vertices
enclosing rectangle: [813201.2, 830538] x [4687280, 4704924] units
                     (17340 x 17640 units)
Window area = 158844000 square units
Fraction of frame area: 0.519
Quadrature weights:
     (counting weights based on 64 x 64 array of rectangular tiles)
All weights:
    range: [9790, 74700]    total: 1.58e+08
Weights on data points:
    range: [18700, 37300]   total: 10900000
Weights on dummy points:
    range: [9790, 74700]    total: 1.48e+08
--------------------------------------------------------------------------------
FITTED :

Nonstationary Poisson process

---- Intensity: ----

Log intensity: ~dist_im_bos
Model depends on external covariate 'dist_im_bos'
Covariates provided:
    dist_im_bos: im

Fitted trend coefficients:
  (Intercept)   dist_im_bos 
-1.226904e+01 -9.882943e-04 

                 Estimate         S.E.      CI95.lo       CI95.hi Ztest
(Intercept) -1.226904e+01 0.0933578252 -12.45202036 -1.208606e+01   ***
dist_im_bos -9.882943e-04 0.0001113262  -0.00120649 -7.700989e-04   ***
                   Zval
(Intercept) -131.419539
dist_im_bos   -8.877461

----------- gory details -----

Fitted regular parameters (theta):
  (Intercept)   dist_im_bos 
-1.226904e+01 -9.882943e-04 

Fitted exp(theta):
 (Intercept)  dist_im_bos 
4.694859e-06 9.990122e-01 
Code
stations_proj_sf <- st_transform(stations_sf, 32618)
docks_proj_sf <- st_transform(docks_sf, 32618)
boundary_proj_sf <- st_transform(hull_buffered_wgs84_sf, 32618)

boundary_owin_sf <- as.owin(boundary_proj_sf)
dock_coords_sf <- st_coordinates(docks_proj_sf)
dock_ppp_sf <- ppp(x = dock_coords_sf[,1],
                y = dock_coords_sf[,2],
                window = boundary_owin_sf)

station_coords_sf <- st_coordinates(stations_proj_sf)
station_ppp_sf <- ppp(x = station_coords_sf[,1],
                   y = station_coords_sf[,2],
                   window = boundary_owin_sf)
Code
dist_im_sf <- distmap(station_ppp_sf)
Code
m0_sf <- ppm(dock_ppp_sf ~ 1)
m1_sf <- ppm(dock_ppp_sf ~ dist_im_sf)

anova(m0_sf, m1_sf, test = "LR")
Analysis of Deviance Table

Model 1: ~1      Poisson
Model 2: ~dist_im_sf     Poisson
  Npar Df Deviance Pr(>Chi)   
1    1                        
2    2  1    9.337 0.002246 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
summary(m1_sf)
Point process model
Fitted to data: dock_ppp_sf
Fitting method: maximum likelihood (Berman-Turner approximation)
Model was fitted using glm()
Algorithm converged
Call:
ppm.formula(Q = dock_ppp_sf ~ dist_im_sf)
Edge correction: "border"
    [border correction distance r = 0 ]
--------------------------------------------------------------------------------
Quadrature scheme (Berman-Turner) = data + dummy + weights

Data pattern:
Planar point pattern:  107 points
Average intensity 8.79e-07 points per square unit
Window: polygonal boundary
single connected closed polygon with 521 vertices
enclosing rectangle: [-3757682, -3743953] x [5411973, 5426315] units
                     (13730 x 14340 units)
Window area = 121743000 square units
Fraction of frame area: 0.618

Dummy quadrature points:
     32 x 32 grid of dummy points, plus 4 corner points
     dummy spacing: 429.0211 x 448.1738 units

Original dummy parameters: =
Planar point pattern:  696 points
Average intensity 5.72e-06 points per square unit
Window: polygonal boundary
single connected closed polygon with 521 vertices
enclosing rectangle: [-3757682, -3743953] x [5411973, 5426315] units
                     (13730 x 14340 units)
Window area = 121743000 square units
Fraction of frame area: 0.618
Quadrature weights:
     (counting weights based on 32 x 32 array of rectangular tiles)
All weights:
    range: [9630, 192000]   total: 1.22e+08
Weights on data points:
    range: [48100, 96100]   total: 8910000
Weights on dummy points:
    range: [9630, 192000]   total: 1.13e+08
--------------------------------------------------------------------------------
FITTED :

Nonstationary Poisson process

---- Intensity: ----

Log intensity: ~dist_im_sf
Model depends on external covariate 'dist_im_sf'
Covariates provided:
    dist_im_sf: im

Fitted trend coefficients:
  (Intercept)    dist_im_sf 
-1.359688e+01 -2.908083e-04 

                 Estimate         S.E.       CI95.lo       CI95.hi Ztest
(Intercept) -1.359688e+01 0.1463608265 -1.388374e+01 -1.331002e+01   ***
dist_im_sf  -2.908083e-04 0.0001052747 -4.971429e-04 -8.447372e-05    **
                  Zval
(Intercept) -92.899709
dist_im_sf   -2.762376

----------- gory details -----

Fitted regular parameters (theta):
  (Intercept)    dist_im_sf 
-1.359688e+01 -2.908083e-04 

Fitted exp(theta):
 (Intercept)   dist_im_sf 
1.244374e-06 9.997092e-01 
Code
stations_proj_chi <- st_transform(stations_chi, 32618)
docks_proj_chi <- st_transform(docks_chi, 32618)
boundary_proj_chi <- st_transform(hull_buffered_wgs84_chi, 32618)

boundary_owin_chi <- as.owin(boundary_proj_chi)
dock_coords_chi <- st_coordinates(docks_proj_chi)
dock_ppp_chi <- ppp(x = dock_coords_chi[,1],
                y = dock_coords_chi[,2],
                window = boundary_owin_chi)

station_coords_chi <- st_coordinates(stations_proj_chi)
station_ppp_chi <- ppp(x = station_coords_chi[,1],
                   y = station_coords_chi[,2],
                   window = boundary_owin_chi)
Code
dist_im_chi <- distmap(station_ppp_chi)
Code
m0_chi <- ppm(dock_ppp_chi ~ 1)
m1_chi <- ppm(dock_ppp_chi ~ dist_im_chi)

anova(m0_chi, m1_chi, test = "LR")
Analysis of Deviance Table

Model 1: ~1      Poisson
Model 2: ~dist_im_chi    Poisson
  Npar Df Deviance  Pr(>Chi)    
1    1                          
2    2  1   300.39 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
summary(m1_chi)
Point process model
Fitted to data: dock_ppp_chi
Fitting method: maximum likelihood (Berman-Turner approximation)
Model was fitted using glm()
Algorithm converged
Call:
ppm.formula(Q = dock_ppp_chi ~ dist_im_chi)
Edge correction: "border"
    [border correction distance r = 0 ]
--------------------------------------------------------------------------------
Quadrature scheme (Berman-Turner) = data + dummy + weights

Data pattern:
Planar point pattern:  905 points
Average intensity 1.09e-06 points per square unit
Window: polygonal boundary
single connected closed polygon with 212 vertices
enclosing rectangle: [-566382.1, -541020.8] x [4685608, 4737799] units
                     (25360 x 52190 units)
Window area = 833001000 square units
Fraction of frame area: 0.629

Dummy quadrature points:
     70 x 70 grid of dummy points, plus 4 corner points
     dummy spacing: 362.3043 x 745.5840 units

Original dummy parameters: =
Planar point pattern:  3156 points
Average intensity 3.79e-06 points per square unit
Window: polygonal boundary
single connected closed polygon with 212 vertices
enclosing rectangle: [-566382.1, -541020.8] x [4685608, 4737799] units
                     (25360 x 52190 units)
Window area = 833001000 square units
Fraction of frame area: 0.629
Quadrature weights:
     (counting weights based on 70 x 70 array of rectangular tiles)
All weights:
    range: [24600, 270000]  total: 8.32e+08
Weights on data points:
    range: [24600, 135000]  total: 1.07e+08
Weights on dummy points:
    range: [24600, 270000]  total: 7.25e+08
--------------------------------------------------------------------------------
FITTED :

Nonstationary Poisson process

---- Intensity: ----

Log intensity: ~dist_im_chi
Model depends on external covariate 'dist_im_chi'
Covariates provided:
    dist_im_chi: im

Fitted trend coefficients:
  (Intercept)   dist_im_chi 
-1.285773e+01 -7.640408e-04 

                 Estimate         S.E.       CI95.lo       CI95.hi Ztest
(Intercept) -1.285773e+01 5.668357e-02 -1.296882e+01 -1.274663e+01   ***
dist_im_chi -7.640408e-04 4.955449e-05 -8.611659e-04 -6.669158e-04   ***
                 Zval
(Intercept) -226.8334
dist_im_chi  -15.4182

----------- gory details -----

Fitted regular parameters (theta):
  (Intercept)   dist_im_chi 
-1.285773e+01 -7.640408e-04 

Fitted exp(theta):
 (Intercept)  dist_im_chi 
2.605915e-06 9.992363e-01 

Correlation Function Results against Train Station Locations

Metropolitan Area Docks dist_im coefficient Z-value Deviance diff
Washington, DC \(774\) \(-9.08 \times 10^{-4}\) \(-19.30\) \(584.99\)
New York, NY \(2,303\) \(-1.22 \times 10^{-3}\) \(-18.27\) \(420.63\)
Boston, MA \(320\) \(-9.88 \times 10^{-4}\) \(-8.88\) \(115.03\)
San Francisco, CA \(107\) \(-2.91 \times 10^{-04}\) \(-2.76\) \(9.337\)
Chicago, IL \(909\) \(-7.70 \times 10^{-4}\) \(-15.53\) \(305.55\)

As we saw with the population density tests, all five null hypotheses are rejected by this process. The extent of how strong the coefficient representing distance to the nearest train station varies in a similar way to the population density coefficient, with the same cities experiencing roughly the same correlation. If we assume that population density and train station locations are heavily correlated, these results are heavily expected, so simply performing this test was not sufficient in isolating the presence of public transit.

Monte Carlo Simulations

Similar to our other hypothesis test, running Monte Carlo simulations will help us better understand how real data compares to data that we can generate under specified conditions. Unlike the previous section, we will use this to distinguish between systems that happen to place docks close to train stations by nature of following population density from those that intentionally seek out locations with more or less access to public transit.

Therefore, in these 999 Monte Carlo simulations for each city, we will generate simulations that place docks proportionally to population density. We can then compare the coefficients for distance to the nearest train station, seeing if the real data has a stronger dependence on this variable than data with a bias towards denser areas. The process and results of these tests are below.

Code
stations_proj_dc <- st_transform(stations_dc, 32618)
station_coords_dc <- st_coordinates(stations_proj_dc)

station_ppp_dc <- ppp(
  x = station_coords_dc[,1],
  y = station_coords_dc[,2],
  window = boundary_owin_dc
)
Code
obs_dist_dc <- nndist(dock_ppp_dc, station_ppp_dc)
Code
obs_mean_dist_dc <- mean(obs_dist_dc)


nsim <- 999
sim_mean_dist_dc <- numeric(nsim)

for (i in 1:nsim) {
  sim_ppp_dc <- rpoint(
    n = dock_ppp_dc$n,
    f = pop_im_dc,
    win = boundary_owin_dc
  )
  sim_dist <- nndist(sim_ppp_dc, station_ppp_dc)
  sim_mean_dist_dc[i] <- mean(sim_dist)
}
Code
p_value_dc <- (sum(sim_mean_dist_dc <= obs_mean_dist_dc) + 1) / (nsim + 1)

x_min <- min(sim_mean_dist_dc, obs_mean_dist_dc)
x_max <- max(sim_mean_dist_dc, obs_mean_dist_dc)

hist(
  sim_mean_dist_dc,
  main = "Washington, DC: Monte Carlo Distribution of\nDistance from Docks to Nearest Train Station",
  xlab = "Simulated Mean Distance",
  breaks = 30,
  col = "lightgray",
  border = "white",
  xlim = c(x_min, x_max)
)

abline(v = obs_mean_dist_dc, col = "red", lwd = 3)

Code
p_value_dc
[1] 0.001
Code
stations_proj_nyc <- st_transform(stations_nyc, 32618)
station_coords_nyc <- st_coordinates(stations_proj_nyc)

station_ppp_nyc <- ppp(
  x = station_coords_nyc[,1],
  y = station_coords_nyc[,2],
  window = boundary_owin_nyc
)
Code
obs_dist_nyc <- nndist(dock_ppp_nyc, station_ppp_nyc)
Code
obs_mean_dist_nyc <- mean(obs_dist_nyc)


nsim <- 999
sim_mean_dist_nyc <- numeric(nsim)

for (i in 1:nsim) {
  sim_ppp_nyc <- rpoint(
    n = dock_ppp_nyc$n,
    f = pop_im_nyc,
    win = boundary_owin_nyc
  )
  sim_dist <- nndist(sim_ppp_nyc, station_ppp_nyc)
  sim_mean_dist_nyc[i] <- mean(sim_dist)
}
Code
p_value_nyc <- (sum(sim_mean_dist_nyc <= obs_mean_dist_nyc) + 1) / (nsim + 1)

x_min <- min(sim_mean_dist_nyc, obs_mean_dist_nyc)
x_max <- max(sim_mean_dist_nyc, obs_mean_dist_nyc)

hist(
  sim_mean_dist_nyc,
  main = "New York, NY: Monte Carlo Distribution of\nDistance from Docks to Nearest Train Station",
  xlab = "Simulated Mean Distance",
  breaks = 30,
  col = "lightgray",
  border = "white",
  xlim = c(x_min, x_max)
)

abline(v = obs_mean_dist_nyc, col = "red", lwd = 3)

Code
p_value_nyc
[1] 1
Code
stations_proj_bos <- st_transform(stations_bos, 32618)
station_coords_bos <- st_coordinates(stations_proj_bos)

station_ppp_bos <- ppp(
  x = station_coords_bos[,1],
  y = station_coords_bos[,2],
  window = boundary_owin_bos
)
Code
obs_dist_bos <- nndist(dock_ppp_bos, station_ppp_bos)
Code
obs_mean_dist_bos <- mean(obs_dist_bos)


nsim <- 999
sim_mean_dist_bos <- numeric(nsim)

for (i in 1:nsim) {
  sim_ppp_bos <- rpoint(
    n = dock_ppp_bos$n,
    f = pop_im_bos,
    win = boundary_owin_bos
  )
  sim_dist <- nndist(sim_ppp_bos, station_ppp_bos)
  sim_mean_dist_bos[i] <- mean(sim_dist)
}
Code
p_value_bos <- (sum(sim_mean_dist_bos <= obs_mean_dist_bos) + 1) / (nsim + 1)

x_min <- min(sim_mean_dist_bos, obs_mean_dist_bos)
x_max <- max(sim_mean_dist_bos, obs_mean_dist_bos)

hist(
  sim_mean_dist_bos,
  main = "Boston, MA: Monte Carlo Distribution of\nDistance from Docks to Nearest Train Station",
  xlab = "Simulated Mean Distance",
  breaks = 30,
  col = "lightgray",
  border = "white",
  xlim = c(x_min, x_max)
)

abline(v = obs_mean_dist_bos, col = "red", lwd = 3)

Code
p_value_bos
[1] 0.855
Code
stations_proj_sf <- st_transform(stations_sf, 32618)
station_coords_sf <- st_coordinates(stations_proj_sf)

station_ppp_sf <- ppp(
  x = station_coords_sf[,1],
  y = station_coords_sf[,2],
  window = boundary_owin_sf
)
Code
obs_dist_sf <- nndist(dock_ppp_sf, station_ppp_sf)
Code
obs_mean_dist_sf <- mean(obs_dist_sf)


nsim <- 999
sim_mean_dist_sf <- numeric(nsim)

for (i in 1:nsim) {
  sim_ppp_sf <- rpoint(
    n = dock_ppp_sf$n,
    f = pop_im_sf,
    win = boundary_owin_sf
  )
  sim_dist <- nndist(sim_ppp_sf, station_ppp_sf)
  sim_mean_dist_sf[i] <- mean(sim_dist)
}
Code
p_value_sf <- (sum(sim_mean_dist_sf <= obs_mean_dist_sf) + 1) / (nsim + 1)

x_min <- min(sim_mean_dist_sf, obs_mean_dist_sf)
x_max <- max(sim_mean_dist_sf, obs_mean_dist_sf)

hist(
  sim_mean_dist_sf,
  main = "San Francisco, CA: Monte Carlo Distribution of\nDistance from Docks to Nearest Train Station",
  xlab = "Simulated Mean Distance",
  breaks = 30,
  col = "lightgray",
  border = "white",
  xlim = c(x_min, x_max)
)

abline(v = obs_mean_dist_sf, col = "red", lwd = 3)

Code
p_value_sf
[1] 0.63
Code
stations_proj_chi <- st_transform(stations_chi, 32618)
station_coords_chi <- st_coordinates(stations_proj_chi)

station_ppp_chi <- ppp(
  x = station_coords_chi[,1],
  y = station_coords_chi[,2],
  window = boundary_owin_chi
)
Code
obs_dist_chi <- nndist(dock_ppp_chi, station_ppp_chi)
Code
obs_mean_dist_chi <- mean(obs_dist_chi)


nsim <- 999
sim_mean_dist_chi <- numeric(nsim)

for (i in 1:nsim) {
  sim_ppp_chi <- rpoint(
    n = dock_ppp_chi$n,
    f = pop_im_chi,
    win = boundary_owin_chi
  )
  sim_dist <- nndist(sim_ppp_chi, station_ppp_chi)
  sim_mean_dist_chi[i] <- mean(sim_dist)
}
Code
p_value_chi <- (sum(sim_mean_dist_chi <= obs_mean_dist_chi) + 1) / (nsim + 1)

x_min <- min(sim_mean_dist_chi, obs_mean_dist_chi)
x_max <- max(sim_mean_dist_chi, obs_mean_dist_chi)

hist(
  sim_mean_dist_chi,
  main = "Chicago, IL: Monte Carlo Distribution of\nDistance from Docks to Nearest Train Station",
  xlab = "Simulated Mean Distance",
  breaks = 30,
  col = "lightgray",
  border = "white",
  xlim = c(x_min, x_max)
)

abline(v = obs_mean_dist_chi, col = "red", lwd = 3)

Code
p_value_chi
[1] 1

Monte Carlo Simulation Results against Train Station Locations

Metropolitan Area p-value
Washington, DC \(0.001\)
New York, NY \(1\)
Boston, MA \(0.817\)
San Francisco, CA \(0.642\)
Chicago, IL \(1\)

These results provide the most evidence for diverging strategies for deploying bikeshare docks. We’ve eliminated population density as a factor by simulating placements that are influenced by population density, a phenomenon confirmed by the first set of hypothesis tests.

In Washington, we can see that simulations give us bikeshare docks that are further away from nearby train stations than the real-life case. Interestingly, this is entirely flipped for New York and Chicago. This tells us that the deployment strategy in Washington deliberately favors those with access to existing public transit, while New York and Chicago provide bikeshare access to areas with less train service.

Boston and San Francisco produced less decisive results, indicating that the presence of public transit may have been less of a consideration for designing those systems. We cannot definitively say that neighborhoods with abundant or lacking public transit are given greater bikeshare access than expected.

Discussion

Based on the results obtained above, we can conclude that there are multiply strategies for deploying bikeshare docks that result in successful systems with high usage. All five of these cities have far outpaced the rest of the country in producing heavily adopted bikeshare options. What this analysis provides is some insight into how organizers deal with conflicting priorities.

On one hand, bikeshare docks provide a useful supplement to people that prefer car-free travel, who may be using it as an intermediary between longer-distance public transit. On the other hand, providing affordable transportation to those without other car-free options is certainly an appealing prospect. In Washington, we see a system that prioritizes the former, with bikeshare docks aligning strongly with train stations. In New York and Chicago, we see the opposite approach, with bikeshare options following population density to provide widespread access regardless of existing public transit coverage.

Future analysis could be done by finding ways to better isolate specific variables, as well as identifying other potential influences on bikeshare dock placements. Additionally, including cities that have been less successful in implementing bikeshare systems with high usage could provide additional insight into what makes these systems work. Finally, obtaining data on ridership for each dock would be an interesting layer of analysis. To what extent do docks closer to population centers and train stations receive higher usage than the rest?

References

Footnotes

  1. “Docked bikeshare trips – October 2025,” Bureau of Transportation Statistics, https://www.bts.gov/newsroom/docked-bikeshare-trips-october-2025 (accessed Dec. 15, 2025).↩︎

  2. S. Banerjee, Md. M. Kabir, N. K. Khadem, and C. Chavis, “Optimal locations for bikeshare stations: A new GIS based spatial approach,” Transportation Research Interdisciplinary Perspectives, vol. 4, p. 100101, Mar. 2020. doi:10.1016/j.trip.2020.100101↩︎

  3. Z. Dehdari Ebrahimi, M. Momenitabar, A. A. Nasri, and J. Mattson, “Using a GIS-based spatial approach to determine the optimal locations of bikeshare stations: The case of Washington D.C,” Transport Policy, vol. 127, pp. 48–60, Oct. 2022. doi:10.1016/j.tranpol.2022.08.008↩︎